Data Visualizations by County

Row

Total Cases in TN

4,362

Negative Tests

52,256

Total Deaths

79

Column

Cases across time in most populous counties

Row

Cases per million residents

Column

Positive cases by county

All Cases by County

Data Visualizations by Demographics

Column

Confirmed Cases by Age

Confirmed Cases by Sex

Column

Confirmed Cases by Race

Confirmed Cases by Ethnicity

About

The Tennessee Coronavirus Dashboard

The sole intention of this Coronavirus dashboard is to provide a visual overview of the 2019 novel coronavirus disease (COVID-19) as it relates to counties in Tennessee. The data is scraped from two different sources, and there are no guarantees on the accuracy of the data because the sources differ in the numbers they report and in when they report them.

Data

Data for “Cases across time in most populous counties” is a concatenation of the New York Times Coronavirus Data, which was last updated on 04-07, and the TN Department of Health data, which is updated daily at 2:00 PM CST.

All current (snapshot) data is from the TN Department of Health only.

One issue that arises from collecting data in this fashion is that the sources collect their numbers differently. The Tennessee Department of Health may lag slightly behind the counties or report numbers differently. The most noticeable difference is a consistent downward dip at the latest date in the “Cases across time” figure for Davidson County. The NYT acquires its county-level data from the counties directly, and the Nashville/Davidson County updates are consistently greater than the numbers the TN Department of Health reports. This could be because the Davidson County data is not separated into in-state and out-of-state patients, and/or because of differences between “Total” and “Active” cases.

Created by Malle Carrasco-Harris.

---
title: "COVID-19 | Tennessee"
output:
    flexdashboard::flex_dashboard:
      orientation: rows
      vertical_layout: scroll
      social: menu
      source_code: embed
knit: (function(input_file, encoding) {
  out_dir <- 'docs';
  rmarkdown::render(input_file,
 encoding=encoding,
 output_file=file.path(dirname(input_file), out_dir, 'index.html'))})
---
  

```{r setup, include=FALSE}
library(flexdashboard)
library(readr)
library(ggplot2)
library(tidyverse)
library(dplyr)
#Acquire Data####
#Load NY Times Data###
nyt_path = 'https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv'

counties = read_csv(url(nyt_path)) #Originally contains all counties in US.

#Separate State
tn = counties[ which(counties$state =='Tennessee'),]

#Scrape the daily data from the TN Health Department. 
library(rvest)
tn_path = 'https://www.tn.gov/health/cedep/ncov.html'

tn_daily = tn_path %>% read_html() %>% html_nodes(xpath='/html/body/div[2]/div[2]/div[2]/div/div/div[4]/div/div/div/div/div/div/div[1]/div/div/div[2]/div/div/div/table') %>% html_table()

tn_daily = tn_daily[[1]]
names(tn_daily) = c('County', 'Positive', 'Negative', 'Death')
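tn_daily$Positive= as.numeric(gsub(",", '', tn_daily$Positive)) #Remove commas; make numeric
tn_daily$Negative= as.numeric(gsub(",", '', tn_daily$Negative)) 
tn_daily$Death = as.numeric(gsub(",", '', tn_daily$Death)) #Same cleaning in case deaths are reported with commas
tn_daily[is.na(tn_daily)] = 0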
tn_daily = tn_daily %>% separate('County', c('County', 'drop'),fill='right', extra='merge')
tn_daily$County = ifelse(tn_daily$County == 'Van',
                         "Van Buren",
                         tn_daily$County)
tn_daily$County = ifelse(tn_daily$County == 'Out' |tn_daily$County == 'OUT',
                         "Out of TN",
                         tn_daily$County)
tn_daily$County = ifelse(tn_daily$County == 'Grand'| tn_daily$County == 'TOTAL',
                         "Total",
                         tn_daily$County)
tn_daily$County = as.factor(tn_daily$County)




#Merge NYT and Tn Daily dataframes####
#Create a separate tn_daily that can fit into the format of tn
tn_daily2 = tn_daily[,c('County', 'Positive', 'Death')]
names(tn_daily2) = c('county', 'cases', 'deaths')
tn_daily2 = tn_daily2[!(tn_daily2$county=='Total' | tn_daily2$county =='Out of TN'),]
tn_daily2 = tibble::add_column(tn_daily2, state = 'Tennessee', .after='county')

fips_daily =tn %>% group_by(county, fips) %>% tally()

tn_daily2 = left_join(tn_daily2, fips_daily[,1:2], by ='county')
tn_daily2 = tibble::add_column(tn_daily2, date = Sys.Date(), .before='county') #Stamp the snapshot with today's date


##Row bind tn_daily (TN Health Dept) with tn
tn = rbind(tn, tn_daily2) #Rbind will automatically put the correct columns together. 


#Add population ####
#Get Population for counties in Tennessee
uscensus = 'https://raw.githubusercontent.com/mfcarrasco/COVID-TN-Counties/master/county_pop_2019.csv'
tn_pop = read_csv(url(uscensus))
tn_pop = tn_pop[ which(tn_pop$State =='Tennessee'),]
tn_pop = tn_pop[-1,c(2:3)]
tn_pop = tn_pop %>% separate('County', c('County', 'drop'),fill='right', extra='merge')
tn_pop$County = ifelse(tn_pop$County == 'Van',
                       "Van Buren",
                       tn_pop$County)
tn_pop$Population = as.numeric(tn_pop$Population)
tn_pop = tn_pop[, c('County', 'Population')]
names(tn_pop) = c('county', 'population')
#tn_pop[order(-tn_pop$population),]

##Combine tn (NYT) dataframe with Population
tn = left_join(tn, tn_pop, by='county')
tn$county = as.factor(tn$county)

#Calculate per million
tn['cases_per_million'] = (tn$cases/tn$population)*10^6


#Clean the global environment###
rm(list=ls()[!ls() %in% c('tn', 'tn_daily', 'counties')])
```

Data Visualizations by County
=======================================

Row {data-width=150}
-----------
### Total Cases in TN

```{r}
total_cases = tn_daily[which(tn_daily$County =='Total'),'Positive'] %>% formattable::comma(digits=0)

valueBox(value = total_cases, icon='fa-user-plus', color='#002D65')
```

### Negative Tests 

```{r} 
total_negatives = tn_daily[which(tn_daily$County =='Total'),'Negative'] %>% formattable::comma(digits=0)

valueBox(value = total_negatives, icon='fa-user-minus', color='#CC0000')
```

### Total Deaths

```{r} 
total_death = tn_daily[which(tn_daily$County =='Total'),'Death'] %>% formattable::comma(digits=0)

valueBox(value = total_death, color='#002D65')
```

Column {data-width=650}
-----------------------------------------------------------------------

### Cases across time in most populous counties

```{r}
library(plotly)
tn_top =c('Shelby', 'Davidson', 'Knox', 'Hamilton', 'Rutherford', 'Williamson')
tn_top = tn[ tn$county %in% tn_top,]


#Line plot of cumulative cases over time, one line per county
t_line = ggplot(data=tn_top, aes(x=date, y=cases, color=county))+
  geom_line(size=1)+
  scale_x_date(expand = c(0,0), date_breaks = '2 day', date_labels = '%b %d')+
  labs(x='', y='Cases')+
  theme(legend.title = element_blank(),
        panel.background = element_blank(),
        axis.line.x = element_line(),
        axis.line.y.left = element_line(),
        axis.text = element_text(face='bold'),
        axis.text.x = element_text(angle=45, hjust=1))
ggplotly(t_line)
```

Row {data-width=650}
-------------------------

### Cases per million residents

```{r, fig.width=10, fig.height=5}
library(usmap)
library(viridis)
tn_geo =tn %>% group_by(county) %>% top_n(1,date)
tn_geo = tn_geo[!(tn_geo$county =='Unknown' | tn_geo$county =='Out of TN'|tn_geo$county =='Pending'|tn_geo$county =='PENDING'),]


tn_geo$fips =fips(state = 'TN', county=tn_geo$county) #get missing fips values for map

plot_usmap(include='TN', regions =  'counties',
           data=tn_geo, values='cases_per_million')+
  labs(title="COVID Cases per Million in Tennessee",
       subtitle = "Based on NYTimes Github Data & \nCensus 2019 Estimates")+
  scale_fill_viridis(name='Cases per million')+
  theme(legend.position = 'right')+
  labs(caption = format(Sys.time(), "%D"))
```

Column {data-width=350 data-height=950 .tabset}
-----------------------------------------------------------------------

### Positive cases by county

```{r}
tn_cases = tn_daily[which(tn_daily$Positive != 0 & tn_daily$County != 'Total' & tn_daily$County != 'Pending' & tn_daily$County != 'PENDING' & tn_daily$County != 'Out of TN'), c('County', 'Positive','Negative','Death')] #Remove where there are no cases

plot_ly(data=tn_cases,
        x=tn_cases$Positive,
        y=reorder(tn_cases$County, tn_cases$Positive),
        type='bar',
        orientation='h', 
        marker= list(color='red')) %>%
  layout(xaxis = list(title= 'Count', 
                      zeroline = FALSE, 
                      showline = F, 
                      showticklabels = T, 
                      showgrid = T),
         yaxis = list(showgrid = FALSE, 
                      showline = FALSE, 
                      showticklabels = TRUE,
                      dtick=1,
                      tickfont = list(size=10)))
```

### All Cases by County

```{r}
plot_ly(data=tn_cases,
        x= reorder(tn_cases$County, tn_cases$Negative),
        y=tn_cases$Negative,
        type='bar',
        name='Negative Cases',
        marker= list(color='darkblue')) %>%
          add_trace(y = tn_cases$Positive,
                    name='Positive Cases',
                    marker = list(color='yellow')) %>%
          add_trace(y = tn_cases$Death,
                    name='Deaths',
                    marker = list(color='red')) %>%
          layout(barmode = 'stack',
                 xaxis = list(showgrid = FALSE, 
                              showline = FALSE, 
                              showticklabels = TRUE,
                              dtick=1,
                              tickfont =list(size=10)),
                 yaxis = list(title= 'Count', 
                              zeroline = FALSE, 
                              showline = F, 
                              showticklabels = T, 
                              showgrid = T),
                 hovermode = 'compare')
```

Data Visualizations by Demographics
==================================

Column {data-width=350 data-height=450}
---------------------------

### Confirmed Cases by Age
```{r}
#Get TN Data
tn_path = 'https://www.tn.gov/health/cedep/ncov.html'

tn_age = tn_path %>% read_html() %>% html_nodes(xpath='/html/body/div[2]/div[2]/div[2]/div/div/div[3]/div/div/div/div/div[2]/div/div[1]/table') %>% html_table()

#Data Cleaning
tn_age = tn_age[[1]]
names(tn_age) = c('Age_Ranges', 'Count')
tn_age = tn_age[!(tn_age$Age_Ranges %in% c('Pending', 'PENDING', 'Total')),] #Drop rows that are not age ranges
tn_age$Age_Ranges = as.factor(tn_age$Age_Ranges)
tn_age$Count = as.integer(gsub(',', '', tn_age$Count)) #Remove commas before converting
tn_age$Percent = round((tn_age$Count/sum(tn_age$Count))*100,1)
#Plot
ggplot(data=tn_age, aes(x=Age_Ranges, y=Count))+
  geom_col(fill='#002D65')+
  xlab('')+
  theme(panel.background = element_blank(), axis.line = element_line(), axis.text = element_text(face = 'bold', size = 12))+
  geom_text(aes(label =paste(Percent, '%' )), vjust=-0.2)
```

### Confirmed Cases by Sex
```{r}
#Get TN Data
tn_path = 'https://www.tn.gov/health/cedep/ncov.html'

tn_sex = tn_path %>% read_html() %>% html_nodes(xpath='/html/body/div[2]/div[2]/div[2]/div/div/div[3]/div/div/div/div/div[4]/div/div[2]/table') %>% html_table()

#Data Cleaning
tn_sex = tn_sex[[1]]
names(tn_sex) = c('Sex', 'Count', 'Percent')
tn_sex$Sex = as.factor(tn_sex$Sex)
tn_sex$Count = as.numeric(gsub(',','',tn_sex$Count))

#Plot
ggplot(data=tn_sex, aes(x=Sex, y=Count))+
  geom_col(fill='#CC0000', width=.5)+
  xlab('')+
  theme(panel.background = element_blank(), axis.line = element_line(), axis.text = element_text(face = 'bold', size = 12))+
  geom_text(aes(label =paste(Percent, '%' )), vjust=-0.2)
```



Column {data-width=350 data-height=450}
---------------------------

### Confirmed Cases by Race
```{r}
#Get TN Data
tn_path = 'https://www.tn.gov/health/cedep/ncov.html'

tn_race = tn_path %>% read_html() %>% html_nodes(xpath='/html/body/div[2]/div[2]/div[2]/div/div/div[3]/div/div/div/div/div[3]/div/div[1]/table') %>% html_table()

#Data Cleaning
tn_race = tn_race[[1]]
names(tn_race) = c('Race', 'Count', 'Percent')
tn_race$Race = as.factor(tn_race$Race)
tn_race$Count = as.numeric(gsub(",", '', tn_race$Count))

# #Plot
# ggplot(data=tn_race, aes(x=reorder(Race, Count), y=Count))+
#   geom_col(fill='#CC0000')+
#   xlab('')+
#   theme(panel.background = element_blank(), axis.line = element_line(), axis.text = element_text(face = 'bold', size = 12), axis.text.x = element_text(angle=15, hjust=1, size=12))+
#   geom_text(aes(label =paste(Percent,'%')), vjust=-0.2)


#Donut chart
tn_race$fraction = tn_race$Count/sum(tn_race$Count) #Compute percentages
tn_race$ymax = cumsum(tn_race$fraction) #compute the cumulative percentages (top of each rectangle)
tn_race$ymin = c(0, head(tn_race$ymax, n=-1)) #Compute the bottom of each rectangle
tn_race$labelPosition = (tn_race$ymax + tn_race$ymin)/2 #Calculate label position
tn_race$label = paste0(tn_race$Race, '\n Percent: ', tn_race$Percent)


ggplot(tn_race, aes(ymax = ymax, ymin=ymin, xmax=4, xmin=3, fill=Race))+
  geom_rect()+
  coord_polar(theta = 'y')+
  xlim(c(2,4))+
  theme_void()+
  scale_fill_brewer(palette = "Set1")
```

### Confirmed Cases by Ethnicity
```{r}
#Get TN Data
tn_path = 'https://www.tn.gov/health/cedep/ncov.html'

tn_eth = tn_path %>% read_html() %>% html_nodes(xpath='/html/body/div[2]/div[2]/div[2]/div/div/div[3]/div/div/div/div/div[4]/div/div[1]/table') %>% html_table()

#Data Cleaning
tn_eth = tn_eth[[1]]
names(tn_eth) = c('Ethnicity', 'Count', 'Percent')
tn_eth$Ethnicity = as.factor(tn_eth$Ethnicity)
tn_eth$Count = as.numeric(gsub(",", '', tn_eth$Count))

# #Plot
# ggplot(data=tn_eth, aes(x=reorder(Ethnicity, Count), y=Count))+
#   geom_col(fill='#CC0000')+
#   xlab('')+
#   theme(panel.background = element_blank(), axis.line = element_line(), axis.text = element_text(face = 'bold', size = 12), axis.text.x = element_text(angle=15, hjust=1, size=12))+
#   geom_text(aes(label =paste(Percent,'%')), vjust=-0.2)

tn_eth$fraction = tn_eth$Count/sum(tn_eth$Count) #Compute percentages
tn_eth$ymax = cumsum(tn_eth$fraction) #compute the cumulative percentages (top of each rectangle)
tn_eth$ymin = c(0, head(tn_eth$ymax, n=-1)) #Compute the bottom of each rectangle
tn_eth$labelPosition = (tn_eth$ymax + tn_eth$ymin)/2 #Calculate label position
tn_eth$label = paste0(tn_eth$Ethnicity, '\n Percent: ', tn_eth$Percent)


ggplot(tn_eth, aes(ymax = ymax, ymin=ymin, xmax=4, xmin=3, fill=Ethnicity))+
  geom_rect()+
  coord_polar(theta = 'y')+
  xlim(c(2,4))+
  theme_void()+
  scale_fill_brewer(palette = "Set1")
```







About 
================================

**The Tennessee Coronavirus Dashboard**    
  
The sole intention of this Coronavirus dashboard is to provide a visual overview of the 2019 novel coronavirus disease (COVID-19) as it relates to counties in Tennessee. The data is scraped from two different sources, and there are no guarantees on the accuracy of the data because the sources differ in the numbers they report and in when they report them.

**Data**

Data for "Cases across time in most populous counties" is a concatenation of the [New York Times Coronavirus Data](https://github.com/nytimes/covid-19-data), which was last updated on `r counties %>% top_n(1, date) %>% pull(date) %>%  max() %>% format('%m-%d')`, and the [TN Department of Health](https://www.tn.gov/health/cedep/ncov.html) data, which is updated daily at 2:00 PM CST.
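The concatenation can be illustrated with toy data. This is a minimal sketch in base R, not the dashboard's actual pipeline; the data frames `nyt_history` and `doh_snapshot` and every value in them are hypothetical.

```r
# Hypothetical NYT-style history: one row per county per date (cumulative cases).
nyt_history <- data.frame(
  date   = as.Date(c("2020-04-06", "2020-04-07", "2020-04-06", "2020-04-07")),
  county = c("Davidson", "Davidson", "Knox", "Knox"),
  cases  = c(900, 950, 100, 110),
  stringsAsFactors = FALSE
)

# Hypothetical Department of Health snapshot: one row per county, no date column.
doh_snapshot <- data.frame(
  county = c("Davidson", "Knox"),
  cases  = c(940, 115),
  stringsAsFactors = FALSE
)

# Stamp the snapshot with a date so it has the same columns as the history.
doh_snapshot$date <- as.Date("2020-04-08")

# rbind() on data frames matches columns by name, so column order need not agree.
combined <- rbind(nyt_history, doh_snapshot)
combined <- combined[order(combined$county, combined$date), ]
print(combined)
```

Because the snapshot is appended as one extra date per county, any disagreement between the two sources shows up only at the final point of each line.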

All current (snapshot) data is from the [TN Department of Health](https://www.tn.gov/health/cedep/ncov.html) only.  

One issue that arises from collecting data in this fashion is that the sources collect their numbers differently. The Tennessee Department of Health may lag slightly behind the counties or report numbers differently. The most noticeable difference is a consistent downward dip at the latest date in the "Cases across time" figure for Davidson County. The NYT acquires its county-level data from the counties directly, and the [Nashville/Davidson County](https://www.asafenashville.org/updates/) updates are consistently greater than the numbers the TN Department of Health reports. This could be because the Davidson County data is not separated into in-state and out-of-state patients, and/or because of differences between "Total" and "Active" cases.
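A dip of this kind can be flagged programmatically. The sketch below is illustrative only and assumes cumulative counts for a single county; the `dates` and `cases` vectors are hypothetical values, not figures from either source.

```r
# Hypothetical cumulative case counts for one county; the last value stands in
# for a state-reported snapshot appended after county-reported history.
dates <- as.Date("2020-04-04") + 0:4
cases <- c(800, 850, 900, 950, 940)

# Cumulative counts should never decrease, so a negative day-over-day change
# signals a mismatch between sources rather than a real drop in cases.
daily_change <- c(NA, diff(cases))
dip_dates <- dates[!is.na(daily_change) & daily_change < 0]
print(dip_dates)
```

A check like this only detects the symptom; it does not say which source is closer to the true count.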

Created by [Malle Carrasco-Harris](https://www.linkedin.com/in/malle-carrasco-harris).